Adaptive and hybrid context-aware fine-grained word sense disambiguation in topic modeling based document representation
Abstract
We propose a hybrid context based topic model with an adaptive window length for word sense disambiguation in document representation. Document representation is an essential part of various tasks, and word sense disambiguation is needed to capture the distinctions between word senses. Traditional methods mainly rely on knowledge libraries for data enrichment; however, the semantics division of a word may vary across different domain-specific datasets. We aim to discover finer-grained semantic differences, such as entities or standpoints, and to handle the problem without data enrichment. There are two challenges in this task: (1) dividing the senses of each polysemous word, and (2) preserving the differences between synonyms. Most existing models either separate each word into sense clusters or specify the senses by integrating an auxiliary module. They can hardly achieve both, since words are assumed to be independent and their intrinsic relationships are ignored. To solve this problem, we introduce the “Bag-of-Senses” (BoS) assumption: a document is a multiset of word senses, and senses are generated instead of words. The sense of a word is estimated from the context in which it occurs and from the contexts of its other occurrences. Besides, the scopes of the contexts are related to each occurrence, and the variable window length is adjusted adaptively. Our experiments on three standard datasets show that our proposal outperforms state-of-the-art models in terms of word sense estimation, topic modeling, and document classification.
Related papers
Fine-Grained Word Sense Disambiguation Based on Parallel Corpora, Word Alignment, Word Clustering and Aligned Wordnets
The paper presents a method for word sense disambiguation based on parallel corpora. The method exploits recent advances in word alignment and word clustering based on automatic extraction of translation equivalents and being supported by available aligned wordnets for the languages in the corpus. The wordnets are aligned to the Princeton Wordnet, according to the principles established by Euro...
A Topic Model for Word Sense Disambiguation
We develop latent Dirichlet allocation with WORDNET (LDAWN), an unsupervised probabilistic topic model that includes word sense as a hidden variable. We develop a probabilistic posterior inference algorithm for simultaneously disambiguating a corpus and learning the domains in which to consider each word. Using the WORDNET hierarchy, we embed the construction of Abney and Light (1999) in the to...
Improving Word Sense Disambiguation Using Topic Features
This paper presents a novel approach for exploiting the global context for the task of word sense disambiguation (WSD). This is done by using topic features constructed using the latent Dirichlet allocation (LDA) algorithm on unlabeled data. The features are incorporated into a modified naïve Bayes network alongside other features such as part-of-speech of neighboring words, single words in the...
Knowledge-based Word Sense Disambiguation using Topic Models
Word Sense Disambiguation is an open problem in Natural Language Processing which is particularly challenging and useful in the unsupervised setting where all the words in any given text need to be disambiguated without using any labeled data. Typically WSD systems use the sentence or a small window of words around the target word as the context for disambiguation because their computational co...
Document Clustering using Word Sense Disambiguation
In computational linguistics, word sense disambiguation (WSD) is the problem of determining in which sense a word having a number of distinct senses is used in a given sentence. This paper handles text document clustering as one of the major tasks of text processing. Document clustering is the process of finding groups of information in text documents and clustering these documents into...
Journal
Journal title: Information Processing and Management
Year: 2021
ISSN: 0306-4573, 1873-5371
DOI: https://doi.org/10.1016/j.ipm.2021.102592